Keyword Spotting Techniques for Sanskrit Documents

Internal identifier: 000827 (Main/Exploration); previous: 000826; next: 000828

Authors: Anurag Bhardwaj [United States]; Srirangaraj Setlur [United States]; Venugopal Govindaraju [United States]

Source:

Lecture Notes in Computer Science [0302-9743]; 2009.

RBID : ISTEX:2B51A63EE49E0CFD6A863AC393C015BCFEC63152

Abstract

With advances in the digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. This chapter addresses two related problems in this area. First, we describe a script-specific keyword spotting technique for Sanskrit documents that exploits domain knowledge of the script. Second, we address the need of digital libraries to provide access to document collections in multiple scripts, which calls for solutions that scale across scripts; for this purpose we present a script-independent keyword spotting approach. Experimental results illustrate the efficacy of both methods.

URL:

https://api.istex.fr/document/2B51A63EE49E0CFD6A863AC393C015BCFEC63152/fulltext/pdf

DOI: 10.1007/978-3-642-00155-0_22

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000015
to stream Istex, to step Curation: 000015
to stream Istex, to step Checkpoint: 000349
to stream Main, to step Merge: 000835
to stream Main, to step Curation: 000827

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Keyword Spotting Techniques for Sanskrit Documents</title>
<author><name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation><country>États-Unis</country>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:2B51A63EE49E0CFD6A863AC393C015BCFEC63152</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-00155-0_22</idno>
<idno type="url">https://api.istex.fr/document/2B51A63EE49E0CFD6A863AC393C015BCFEC63152/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000015</idno>
<idno type="wicri:Area/Istex/Curation">000015</idno>
<idno type="wicri:Area/Istex/Checkpoint">000349</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Bhardwaj A:keyword:spotting:techniques</idno>
<idno type="wicri:Area/Main/Merge">000835</idno>
<idno type="wicri:Area/Main/Curation">000827</idno>
<idno type="wicri:Area/Main/Exploration">000827</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Keyword Spotting Techniques for Sanskrit Documents</title>
<author><name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
<affiliation><wicri:noCountry code="subField">Amherst</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation><wicri:noCountry code="subField">Amherst</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation><wicri:noCountry code="subField">Amherst</wicri:noCountry>
<country>États-Unis</country>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">2B51A63EE49E0CFD6A863AC393C015BCFEC63152</idno>
<idno type="DOI">10.1007/978-3-642-00155-0_22</idno>
<idno type="ChapterID">22</idno>
<idno type="ChapterID">Chap22</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: With advances in the field of digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most of the current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. This chapter presents two relevant topics in this area. First, we describe the use of a script specific Keyword Spotting for Sanskrit documents that makes use of domain knowledge of the script. Second, we address the needs of a digital library to provide access to a collection of documents from multiple scripts. This requires intelligent solutions which scale across different scripts. We present a script independent Keyword Spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Buffalo (New York)</li>
</settlement>
<orgName><li>Université d'État de New York</li>
<li>Université d'État de New York à Buffalo</li>
</orgName>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
</noRegion>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
</country>
</tree>
</affiliations>
</record>
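The record above is TEI-style XML, so its core fields can be read with a standard XML parser. The following Python sketch uses only `xml.etree.ElementTree` from the standard library to parse a trimmed excerpt of the record and return the title, author names, year and DOI. The excerpt, the `summarize` function and the dictionary keys are illustrative assumptions, not part of Dilib or ISTEX. The `wicri:`-prefixed attributes are left out because this view of the record does not declare that namespace prefix, and a namespace-aware parser such as ElementTree rejects undeclared prefixes.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record above (hypothetical test input).
# wicri:-prefixed attributes are omitted: the prefix is not declared
# in this view, so ElementTree would fail with "unbound prefix".
RECORD = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Keyword Spotting Techniques for Sanskrit Documents</title>
<author><name first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name></author>
<author><name first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name></author>
<author><name first="Venu" last="Govindaraju">Venugopal Govindaraju</name></author>
</titleStmt>
<publicationStmt>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-00155-0_22</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def summarize(xml_text):
    """Return the title, author display names, year and DOI of a record."""
    root = ET.fromstring(xml_text)  # root element is <record>
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        # One <name> per <author>; its text is the full display name.
        "authors": [n.text for n in title_stmt.findall("author/name")],
        "year": pub_stmt.find("date").get("year"),
        # ElementTree's limited XPath supports [@attr='value'] predicates.
        "doi": pub_stmt.findtext("idno[@type='doi']"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```

For example, `summarize(RECORD)["doi"]` yields the DOI string listed in the record. The paths are written relative to the `record` root, following the nesting shown in the XML above; applying the sketch to the full record would first require declaring the `wicri` prefix or stripping those attributes.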

To handle this document under Unix (Dilib)

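# Point to the Main/Exploration step of the OcrV1 area
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000827 | SxmlIndent | more

# Equivalent form, assuming EXPLOR_AREA is already set to the OcrV1 area root
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000827 | SxmlIndent | more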

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:2B51A63EE49E0CFD6A863AC393C015BCFEC63152
   |texte=   Keyword Spotting Techniques for Sanskrit Documents
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information it contains has not been validated.
